
Learning curves theory for hierarchically compositional data with power-law distributed features

Cagnetta, Francesco, Kang, Hyunmo, Wyart, Matthieu

arXiv.org Machine Learning

Recent theories suggest that Neural Scaling Laws arise whenever the task is linearly decomposed into power-law distributed units. Alternatively, scaling laws also emerge when data exhibit a hierarchically compositional structure, as is thought to occur in language and images. To unify these views, we consider classification and next-token prediction tasks based on probabilistic context-free grammars -- probabilistic models that generate data via a hierarchy of production rules. For classification, we show that having power-law distributed production rules results in a power-law learning curve with an exponent depending on the rules' distribution and a large multiplicative constant that depends on the hierarchical structure. By contrast, for next-token prediction, the distribution of production rules controls the local details of the learning curve, but not the exponent describing the large-scale behaviour.
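As a toy illustration of the data model described above (not the paper's exact setup), the following sketch generates sequences from a probabilistic context-free grammar in which each symbol expands into two children via production rules whose probabilities follow a power law. The grammar size, depth, and power-law exponent are all invented for illustration.

```python
import random

def powerlaw_probs(n_rules, alpha=1.5):
    """Rule probabilities p_k proportional to k**(-alpha)."""
    weights = [k ** -alpha for k in range(1, n_rules + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def make_grammar(n_symbols, n_rules, rng):
    # For each symbol, n_rules candidate expansions into two child symbols.
    return {s: [(rng.randrange(n_symbols), rng.randrange(n_symbols))
                for _ in range(n_rules)]
            for s in range(n_symbols)}

def generate(symbol, depth, grammar, probs, rng):
    """Expand `symbol` for `depth` levels; leaves are the observed tokens."""
    if depth == 0:
        return [symbol]
    a, b = rng.choices(grammar[symbol], weights=probs)[0]
    return (generate(a, depth - 1, grammar, probs, rng)
            + generate(b, depth - 1, grammar, probs, rng))

rng = random.Random(0)
grammar = make_grammar(n_symbols=4, n_rules=8, rng=rng)
probs = powerlaw_probs(8, alpha=1.5)
seq = generate(0, depth=4, grammar=grammar, probs=probs, rng=rng)
print(len(seq))  # a depth-4 binary tree yields 2**4 = 16 leaf tokens
```

Varying `alpha` changes how heavy-tailed the rule usage is, which is the knob the abstract ties to the learning-curve exponent in the classification setting.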


Debiased high-dimensional regression calibration for errors-in-variables log-contrast models

Zhao, Huali, Wang, Tianying

arXiv.org Machine Learning

Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work tackles the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated covariates. We introduce a calibration approach tailored to the linear log-contrast model. Under relatively lenient conditions on the sparsity level of the parameter, we establish the asymptotic normality of the estimator for inference. Numerical experiments and an application to a microbiome study demonstrate the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability to a wide range of research contexts.
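For readers unfamiliar with the linear log-contrast model mentioned above, a small numerical check (not the paper's calibration estimator) shows its defining identity: when the coefficients sum to zero, the model sum_j beta_j * log(x_j) equals the log-ratio form sum_{j<p} beta_j * log(x_j / x_p), so the zero-sum constraint can be absorbed by regressing on log-ratios against a reference component. The coefficients and composition below are invented.

```python
import math
import random

rng = random.Random(1)
p = 5
beta = [0.8, -0.3, 0.5, -0.6]
beta.append(-sum(beta))          # enforce the zero-sum constraint

# a random composition on the simplex (positive entries summing to 1)
raw = [rng.random() + 0.1 for _ in range(p)]
x = [v / sum(raw) for v in raw]

lhs = sum(b * math.log(v) for b, v in zip(beta, x))
rhs = sum(beta[j] * math.log(x[j] / x[-1]) for j in range(p - 1))
print(abs(lhs - rhs) < 1e-12)    # True: the two parameterizations agree
```

The equivalence holds because the reference-component terms cancel exactly when the coefficients sum to zero, which is why the constraint is standard for compositional covariates.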


A Dirichlet stochastic block model for composition-weighted networks

Promskaia, Iuliia, O'Hagan, Adrian, Fop, Michael

arXiv.org Machine Learning

Network data are observed in various applications where the individual entities of the system interact with or are connected to each other, and often these interactions are defined by their associated strength or importance. Clustering is a common task in network analysis that involves finding groups of nodes displaying similarities in the way they interact with the rest of the network. However, most clustering methods use the strengths of connections between entities in their original form, ignoring the possible differences in the capacities of individual nodes to send or receive edges. This often leads to clustering solutions that are heavily influenced by the nodes' capacities. One way to overcome this is to analyse the strengths of connections in relative rather than absolute terms, expressing each edge weight as a proportion of the sending (or receiving) capacity of the respective node. This, however, induces additional modelling constraints that most existing clustering methods are not designed to handle. In this work we propose a stochastic block model for composition-weighted networks based on direct modelling of compositional weight vectors using a Dirichlet mixture, with the parameters determined by the cluster labels of the sender and the receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm that uses a working independence assumption, expressing the complete data likelihood of each node of the network as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to aid the choice of the number of clusters. The model is validated using simulation studies, and showcased on network data from the Erasmus exchange program and a bike sharing network for the city of London.
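The preprocessing step the abstract describes, expressing each edge weight as a proportion of the sending node's total capacity, can be sketched in a few lines. The node names and weights below are invented for illustration; the paper's Dirichlet mixture then models these compositional vectors directly.

```python
raw_weights = {                       # raw directed edge weights
    "A": {"B": 30.0, "C": 10.0},
    "B": {"A": 5.0, "C": 5.0},
    "C": {"A": 1.0, "B": 3.0},
}

def to_compositions(graph):
    """Normalize each sender's out-weights into a composition (sums to 1)."""
    comp = {}
    for sender, edges in graph.items():
        total = sum(edges.values())
        comp[sender] = {recv: w / total for recv, w in edges.items()}
    return comp

comp = to_compositions(raw_weights)
print(comp["A"])   # {'B': 0.75, 'C': 0.25}
```

After this transformation, node "A" and a node sending ten times heavier raw weights in the same proportions become indistinguishable, which is exactly the capacity effect the relative representation removes.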


Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics

So, Banghee, Valdez, Emiliano A.

arXiv.org Machine Learning

In this paper, we explore advanced modifications to the Tweedie regression model in order to address its limitations in modeling aggregate claims for various types of insurance such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the large incidence of zero claims. Our recommended approach involves a refined modeling of the zero-claim process, together with the integration of boosting methods that leverage an iterative process to enhance predictive accuracy. Although this iteration inherently slows learning, several efficient implementations that also support precise parameter tuning, such as XGBoost, LightGBM, and CatBoost, have emerged. We chose to utilize CatBoost, an efficient boosting approach that effectively handles categorical and other special types of data. The core contribution of our paper is the combination of separate modeling for zero claims with tree-based boosting ensemble methods within a CatBoost framework, assuming that the inflated probability of zero is a function of the mean parameter. The efficacy of our enhanced Tweedie model is demonstrated on an insurance telematics dataset, which presents the additional complexity of compositional feature variables. Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions suitable for insurance claim analytics.
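To make the claim process concrete, the sketch below simulates a zero-inflated Tweedie outcome: a compound Poisson-gamma aggregate claim with an extra, mean-dependent probability of producing zero. The logistic link for the zero probability and all parameter values are assumptions for this sketch, not the paper's fitted model.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for a Poisson draw (stdlib has no Poisson)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def zero_inflated_tweedie(mu, rng, lam=1.2, shape=2.0, scale=1.0):
    # inflated zero probability as an (assumed) logistic function of mu
    p0 = 1.0 / (1.0 + math.exp(mu))
    if rng.random() < p0:
        return 0.0
    n = poisson(lam, rng)                         # number of claims
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(42)
claims = [zero_inflated_tweedie(mu=0.5, rng=rng) for _ in range(1000)]
zero_share = sum(c == 0.0 for c in claims) / len(claims)
print(zero_share)   # well above the plain compound-Poisson zero rate exp(-lam)
```

The point of the two-part structure is visible in the output: zeros arise both from the inflation component and from Poisson draws of zero claims, matching the excess-zero pattern the abstract says plain Tweedie models understate.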


From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models

He, Qianyu, Zeng, Jie, He, Qianxi, Liang, Jiaqing, Xiao, Yanghua

arXiv.org Artificial Intelligence

It is imperative for large language models (LLMs) to follow instructions with elaborate requirements (i.e., complex instruction following). Yet it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge this gap, we first study what training data are effective in enhancing complex-constraint-following abilities. We find that training LLMs on instructions containing multiple constraints enhances their understanding of complex instructions, especially those at lower complexity levels. The improvement even generalizes to compositions of out-of-domain constraints. We further propose methods for obtaining and utilizing such effective training data. Finally, we conduct extensive experiments demonstrating the effectiveness of our methods in terms of overall performance and training efficiency. We also show that our methods improve models' general instruction-following ability and generalize effectively across out-of-domain, in-domain, and adversarial settings, while maintaining general capabilities.


Offline robot programming assisted by task demonstration: an AutomationML interoperable solution for glass adhesive application and welding

Babcinschi, M., Cruz, F., Duarte, N., Santos, S., Alves, S., Neto, P.

arXiv.org Artificial Intelligence

Robots have been successfully deployed in both traditional and novel manufacturing processes. However, they are still difficult to program by non-experts, which limits their accessibility to a wider range of potential users. Programming robots requires expertise in both robotics and the specific manufacturing process in which they are applied. Robot programs created offline often lack parameters that represent relevant manufacturing skills when executing a specific task. These skills encompass aspects like robot orientation and velocity. This paper introduces an intuitive robot programming system designed to capture manufacturing skills from task demonstrations performed by skilled workers. Demonstration data, including orientations and velocities of the working paths, are acquired using a magnetic tracking system fixed to the tools used by the worker. Positional data are extracted from CAD/CAM. Robot path poses are transformed into Cartesian space and validated in simulation, subsequently leading to the generation of robot programs. PathML, an AutomationML-based syntax, integrates robot and manufacturing data across the heterogeneous elements and stages of the manufacturing systems considered. Experiments conducted on the glass adhesive application and welding processes showcased the intuitive nature of the system, with path errors falling within the functional tolerance range.


Hierarchical mixture of discriminative Generalized Dirichlet classifiers

Togban, Elvis, Ziou, Djemel

arXiv.org Machine Learning

This paper presents a discriminative classifier for compositional data. The classifier is based on the posterior distribution of the Generalized Dirichlet, which is the discriminative counterpart of the Generalized Dirichlet mixture model. Moreover, following the mixture-of-experts paradigm, we propose a hierarchical mixture of this classifier. To learn the model parameters, we use a variational approximation obtained by deriving an upper bound for the Generalized Dirichlet mixture. To the best of our knowledge, this is the first time this bound has been proposed in the literature. Experimental results are presented for spam detection and color space identification.
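For readers unfamiliar with the Generalized Dirichlet distribution underlying this classifier, the sketch below samples from it via its stick-breaking (Connor-Mosimann) construction: v_i ~ Beta(a_i, b_i) and x_i = v_i * prod_{j<i}(1 - v_j). The parameter values are illustrative, and the paper's discriminative posterior classifier and variational bound are not reproduced here.

```python
import random

def sample_generalized_dirichlet(a, b, rng):
    """Draw one Generalized Dirichlet sample by stick-breaking."""
    x, stick = [], 1.0
    for ai, bi in zip(a, b):
        v = rng.betavariate(ai, bi)   # independent Beta fractions
        x.append(v * stick)           # take a fraction of the remaining stick
        stick *= 1.0 - v
    return x                          # components are positive and sum to < 1

rng = random.Random(7)
x = sample_generalized_dirichlet([2.0, 3.0, 1.5], [4.0, 2.0, 3.0], rng)
print(all(v > 0 for v in x), sum(x) < 1.0)   # True True
```

Unlike the standard Dirichlet, the Generalized Dirichlet has a separate Beta parameter pair per component, giving it a more flexible covariance structure on the simplex, which is what makes it attractive for compositional data.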